ACEEE Int. J. on Signal & Image Processing, Vol. 01, No. 03, Dec 2010 



Intrusion Detection using C4.5: Performance 
Enhancement by Classifier Combination 

Manasi Gyanchandani 1, R. N. Yadav 2, J. L. Rana 3

1 Dept. of Information Technology, MANIT, Bhopal
manasi_gyanchandani@yahoo.co.in
2 Dept. of Electronics and Communication, MANIT, Bhopal
myadav@gmail.com
3 Ex-HOD, Dept. of CS/IT, MANIT, Bhopal
jl_rana@yahoo.com



Abstract: Data security has become a critical part of any
organizational information system. An Intrusion Detection
System (IDS) is used as a security measure to preserve data
integrity and system availability against various attacks. This
paper evaluates the performance of the C4.5 classifier and its
combination using bagging, boosting and stacking over the
NSL-KDD dataset for IDS. This dataset consists of selected
records of the complete KDD dataset.

I. INTRODUCTION 

Information technology has drastically changed our lives,
and at the same time we have become completely dependent
on technology that is vulnerable to attacks. Such attacks
can compromise the confidentiality, integrity and availability
of information. They are estimated to cost tens or even
hundreds of millions of dollars each year, and the number of
attacks is doubling each year. These attacks may pose a
serious threat to national security. IDSs are designed to
monitor attacks and generate alarms whenever abnormal
activities are detected.

IDSs can be categorized by the events they monitor and
the way they collect evidence that an intrusion has
occurred. IDSs that analyze data circulating on the network
are called network-based IDSs (NIDSs), and IDSs that
reside on the host and collect logs of operating-system-related
events are called host-based IDSs (HIDSs) [3] [8].

Two types of Intrusion Detection techniques exist based 
on the method of inspecting the traffic: 

• Signature based IDS 

• Statistical anomaly based IDS. 

In signature based IDS, also known as misuse detection,
signatures of known attacks are stored and events are
matched against the stored signatures. An intrusion is
signalled if a match is found. The main drawback of this
method is that it cannot detect new attacks whose
signatures are unknown: an IDS using misuse detection
will only detect known attacks, or attacks that are similar
enough to a known attack to match its signature [3].
Statistical anomaly based intrusion detection has attracted
many academic researchers due to its potential for
addressing novel attacks. Researchers have found that
several machine learning algorithms achieve a very high
detection rate while keeping the false alarm rate low.
Anomaly detection applied to intrusion detection and
computer security has been an active area of research since
it was originally proposed in [8].

Initially the KDDCUP '99 dataset was used for IDS
evaluation, but it has some inherent problems. The most
important is the huge number of redundant records: about
78% and 75% of the records are duplicated in the train and
test sets, respectively. The large number of redundant
records in the train set biases learning algorithms towards
the more frequent records and thus prevents them from
learning infrequent records, such as U2R and R2L attacks,
which are usually more harmful to networks. Duplicated
records in the test set, in turn, bias the evaluation results
in favour of methods that have better detection rates on the
frequent records. To remove these problems a new dataset,
NSL-KDD, was proposed [1].

C4.5, an extension of the ID3 algorithm [11], is an
algorithm used to generate a decision tree. The decision
trees generated by C4.5 can be used for classification, and
for this reason it is often referred to as a statistical
classifier. Its results can vary significantly if the training
data is changed. This variation is known as error due to
variance, and it can be minimized using various classifier
combinations.
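
The paper does not detail how C4.5 chooses its splits. As general background, C4.5 is usually described as ranking candidate attributes by gain ratio (information gain divided by split information). The sketch below illustrates that criterion for one discrete attribute; the function names and the toy records are assumptions made for illustration and are not taken from the paper's experiments.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on a discrete attribute,
    normalised by the split information of that attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records: protocol type versus class label.
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp"]
label = ["attack", "attack", "normal", "normal", "attack", "normal"]
print(gain_ratio(protocol, label))
```

In this toy case the attribute separates the two classes perfectly, so it receives the highest possible score; an attribute unrelated to the class would score zero.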

Section II presents the description of dataset used. 
Section III describes the various classifier combination 
techniques. Section IV provides the experimental results 
and discussion. Section V concludes the paper. 

II. DATA SET DESCRIPTION 

Most experiments on intrusion detection are done on the
KDDCUP '99 dataset, which is a subset of the 1998 DARPA
Intrusion Detection Evaluation dataset and was produced by
extracting 41 features from the raw data of the DARPA 98
dataset. [4] defined higher-level features that help in
distinguishing "good" normal connections from "bad"
connections (attacks). This data can be used to test both
host based and network based systems, and both signature
and anomaly detection systems. A connection is a sequence
of Transmission Control Protocol (TCP) packets starting
and ending at well defined times, between which data flows
from a source IP address to a target IP address under some
well defined protocol. Each connection is labeled as normal,
or as an attack with exactly one specific attack type. Each
connection record consists of about 100 bytes [9]. Some of
the basic features of individual TCP connections are listed
in Table I.

TABLE I
BASIC FEATURES OF INDIVIDUAL TCP CONNECTIONS

Feature name      Description                                                   Type
Duration          length (number of seconds) of the connection                  Continuous
Protocol Type     type of the protocol, e.g. tcp, udp, etc.                     Discrete
Service           network service on the destination, e.g., http, telnet, etc.  Discrete
src_bytes         number of data bytes from source to destination               Continuous
dst_bytes         number of data bytes from destination to source               Continuous
Flag              normal or error status of the connection                      Discrete
Land              1 if connection is from/to the same host/port; 0 otherwise    Discrete
wrong_fragment    number of "wrong" fragments                                   Continuous
Urgent            number of urgent packets                                      Continuous
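
As an illustration of how the basic features of Table I might be held in a program, the following sketch defines a small record type and a parser. The class name, the comma-separated input format, the field order (which simply follows Table I rather than any dataset file layout) and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class BasicFeatures:
    duration: float       # continuous: connection length in seconds
    protocol_type: str    # discrete: e.g. tcp, udp
    service: str          # discrete: e.g. http, telnet
    src_bytes: int        # continuous: bytes from source to destination
    dst_bytes: int        # continuous: bytes from destination to source
    flag: str             # discrete: normal or error status
    land: int             # discrete: 1 if same host/port, 0 otherwise
    wrong_fragment: int   # continuous: number of "wrong" fragments
    urgent: int           # continuous: number of urgent packets

def parse_basic(line):
    """Parse one comma-separated line whose fields follow the order of Table I."""
    f = line.strip().split(",")
    return BasicFeatures(float(f[0]), f[1], f[2], int(f[3]), int(f[4]),
                         f[5], int(f[6]), int(f[7]), int(f[8]))

# Hypothetical sample record, not taken from the dataset.
rec = parse_basic("0,tcp,http,181,5450,SF,0,0,0")
print(rec.service, rec.src_bytes)
```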



A. NSL-KDD 

The KDD train and test sets contain a huge number of
redundant records: about 78% and 75% of the records are
duplicated in the train and test sets, respectively. This may
bias classification algorithms towards the redundant
records and thus prevent them from correctly classifying
the other (non-duplicate) records. To solve this problem, a
new dataset, NSL-KDD, was developed. All repeated
records in the entire KDD train and test sets were removed,
and only one copy of each record was kept. Tables II and
III show the statistical analysis of the reduction of repeated
records in the KDD train and test sets, respectively [1].

TABLE II
STATISTICAL ANALYSIS OF REDUNDANT RECORDS IN THE KDD TRAIN SET

           Original Records    Distinct Records    Reduction rate
Attacks    3,925,650           262,178             93.32%
Normal     972,781             812,814             16.44%
Total      4,898,431           1,074,992           78.05%



TABLE III
STATISTICAL ANALYSIS OF REDUNDANT RECORDS IN THE KDD TEST SET

           Original Records    Distinct Records    Reduction rate
Attacks    250,436             29,378              88.26%
Normal     60,591              47,911              20.92%
Total      311,027             77,289              75.15%
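
The duplicate removal and the reduction rate reported in Tables II and III can be sketched as follows. This is only a minimal illustration, assuming that each record is a tuple of field values and that the reduction rate is the share of original records that were dropped; the function names and the toy records are hypothetical and do not reproduce the procedure of [1].

```python
def remove_duplicates(records):
    """Keep only the first copy of each record, preserving order."""
    seen = set()
    distinct = []
    for rec in records:
        key = tuple(rec)
        if key not in seen:
            seen.add(key)
            distinct.append(rec)
    return distinct

def reduction_rate(original, distinct):
    """Percentage of the original records removed as duplicates."""
    return 100.0 * (original - distinct) / original

# Hypothetical toy records (protocol, service, label).
toy = [("tcp", "http", "normal"), ("tcp", "http", "normal"),
       ("udp", "domain_u", "normal"), ("tcp", "http", "normal")]
kept = remove_duplicates(toy)
print(len(kept), reduction_rate(len(toy), len(kept)))

# Totals from Tables II and III.
print(round(reduction_rate(4898431, 1074992), 2))
print(round(reduction_rate(311027, 77289), 2))
```

Applied to the totals of Tables II and III, the same formula gives values that agree with the reported 78.05% and 75.15%.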



B. Evaluation Metrics 

The metrics mainly used to evaluate the performance of a
classifier are presented in [6] [2] and are given here for
ready reference.

• The true positives (TP) and true negatives (TN) are
correct classifications. The true positive rate (TPR) is
the probability that there is an alert when there is an
intrusion. It is calculated as below.

TPR = TP / (TP + FN)

• A false positive (FP) occurs when the outcome is
incorrectly predicted as yes (or positive) when it
is actually no (negative). The false positive rate
(FPR) is calculated as below.

FPR = FP / (TN + FP)

• A false negative (FN) occurs when the outcome 
is incorrectly predicted as negative when it is 
actually positive. 

• Recall: the percentage of the total relevant
documents in a database that are retrieved by a
search. If there are 1000 relevant documents in a
database and a search retrieves 100 of them, the
recall is 10%. It is calculated as below.

Recall = TP / (TP + FN)

• Precision: the percentage of relevant documents
among the documents retrieved. If a search
retrieves 100 documents and 20 of these are
relevant, the precision is 20%. It is calculated as
below.

Precision = TP / (TP + FP)

• The overall success rate is the number of correct 
classifications divided by the total number of 
classifications. 

Success rate = (TP+TN) / (TP+TN+FP+FN) 
Error Rate = 1- Success rate 

• In a multiclass prediction, the result on a test set is
often displayed as a two dimensional confusion
matrix with a row and a column for each class. Each
matrix element shows the number of test examples
for which the actual class is the row and the
predicted class is the column. Good results
correspond to large numbers down the main
diagonal and small, ideally zero, off-diagonal
elements. The layout of the confusion matrix is
shown in Table IV.

TABLE IV
CONFUSION MATRIX

                    Predicted Class
Actual Class        Attack      Normal
Attack              TP          FN
Normal              FP          TN
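
The formulas of this section can be evaluated directly from the four cells of Table IV. The sketch below does this for a hypothetical set of counts; the function name and the numbers are assumptions chosen for illustration and are not results from the paper.

```python
def metrics(tp, fn, fp, tn):
    """Compute the evaluation metrics of this section from confusion-matrix counts."""
    total = tp + fn + fp + tn
    success = (tp + tn) / total
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (tn + fp),
        "Recall": tp / (tp + fn),
        "Precision": tp / (tp + fp),
        "Success rate": success,
        "Error rate": 1 - success,
    }

# Hypothetical counts: 100 actual attacks and 100 actual normal connections.
m = metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With the formulas as given above, TPR and recall are the same quantity, so the two entries always coincide.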



III. CLASSIFIER COMBINATION TECHNIQUES

Classifier combination techniques can be used to reduce
the error due to variance. To make decisions in intrusion
detection more reliable, the outputs of different models can
be combined. Several machine learning techniques do this
by learning an ensemble of models and using them in
combination; bagging, boosting and stacking are the most
efficient among them. These techniques can increase
predictive performance over a single model and can be
applied to both numeric prediction problems and
classification tasks. All three generally perform well. An
ensemble of classifiers is a set of classifiers whose
individual decisions are combined to classify new
examples. The purpose of combining classifiers is to
improve on the accuracy of a single classifier [10].

A. Bagging: 

The bootstrap aggregating (bagging) algorithm generates
different classifiers from different bootstrap samples and
combines the decisions of these classifiers into a single
prediction by voting (the class that gets the most votes from
the classifiers wins).
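
The bootstrap-and-vote procedure described above can be sketched as follows. This is a minimal illustration only: a one-feature threshold rule (a decision stump) stands in for C4.5, and the function names, the number of models, the random seed and the toy data are all assumptions, not the configuration used in the experiments.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a threshold rule on one numeric feature.
    Returns (threshold, label_if_below_or_equal, label_if_above)."""
    best = None
    for t, _ in samples:
        for lo, hi in (("normal", "attack"), ("attack", "normal")):
            errors = sum((lo if x <= t else hi) != y for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    return best[1:]

def predict_stump(stump, x):
    t, lo, hi = stump
    return lo if x <= t else hi

def bagging(samples, n_models=11, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]
        models.append(train_stump(boot))
    return models

def vote(models, x):
    """Combine the individual decisions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional training data (feature value, class).
data = [(1, "normal"), (2, "normal"), (3, "normal"),
        (8, "attack"), (9, "attack"), (10, "attack")]
models = bagging(data)
print(vote(models, 0), vote(models, 12))
```

Because each model sees a slightly different sample, the individual thresholds differ, and the vote averages out that variation.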

B. Boosting: 

Another method to construct an ensemble of classifiers
is known as boosting, which is used to boost the
performance of a weak learner. A weak learner is a simple
classifier whose error is less than 50% on the training
instances. Models that are more successful are assigned
more weight than the other models, and each new model is
influenced by the performance of the previously built
models.

Thus boosting can build a powerful combined classifier
from very simple learning methods, converting these weak
learners into strong ones. It often produces classifiers that
are more accurate on fresh data than ones generated by
bagging. However, it sometimes fails in practical
situations: it can generate a classifier that is less accurate
than a single classifier built from the same data [7].
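
The weighting scheme described above can be sketched with an AdaBoost-style procedure. The paper does not state which boosting variant was used, so the choice of AdaBoost here is an assumption, as are the threshold-rule weak learner, the function names and the toy data (+1 for attack, -1 for normal).

```python
import math

def best_stump(xs, ys, w):
    """Weak learner: threshold rule with the lowest weighted error.
    The rule predicts s if x <= t, otherwise -s."""
    best = None
    for t in xs:
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (s if x <= t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                 # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, w)
        if err >= 0.5:                # no longer a weak learner
            break
        err = max(err, 1e-10)         # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # more accurate model, more weight
        ensemble.append((alpha, t, s))
        # Increase the weights of misclassified instances, decrease the rest.
        w = [wi * math.exp(-alpha * y * (s if x <= t else -s))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak models."""
    score = sum(alpha * (s if x <= t else -s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data that no single threshold rule can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy set any single threshold rule misclassifies at least two of the six instances, whereas the weighted combination of three such rules classifies all six correctly.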

C. Stacking:

Stacking is short for Stacked Generalization. Unlike
bagging and boosting, it uses different learning algorithms
to generate the ensemble of classifiers. The main idea of
stacking is to combine classifiers from different learners
such as decision trees, instance-based learners, etc.

Since each one uses different knowledge representation 
and different learning biases, the hypothesis space will be 
explored differently, and different classifiers will be 
obtained. Thus, it is expected that they will not be 
correlated. 

Once the classifiers have been generated, they must be
combined. Unlike bagging and boosting, stacking does not
use a voting system because, for example, if the majority of
the classifiers make bad predictions, voting leads to a bad
final classification. To solve this problem, stacking uses the
concept of a meta learner [10]. The meta learner (or level-1
model) tries to learn, using a learning algorithm, how the
decisions of the base classifiers (or level-0 models) should
be combined.
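
The level-0 and level-1 arrangement can be sketched as below. This is a minimal illustration under several assumptions: two very simple level-0 learners (a class-mean threshold and a 1-nearest-neighbour rule) stand in for real learners such as decision trees, the level-1 model is a lookup table over level-0 outputs, and the level-1 model is trained on a separate held-out split. None of the names or data come from the paper.

```python
from collections import Counter, defaultdict

def train_threshold(samples):
    """Level-0 learner 1: cut at the midpoint between the two class means."""
    means = {}
    for label in ("normal", "attack"):
        vals = [x for x, y in samples if y == label]
        means[label] = sum(vals) / len(vals)
    cut = (means["normal"] + means["attack"]) / 2
    low = "normal" if means["normal"] < means["attack"] else "attack"
    high = "attack" if low == "normal" else "normal"
    return lambda x: low if x <= cut else high

def train_nearest(samples):
    """Level-0 learner 2: 1-nearest-neighbour on the single feature."""
    return lambda x: min(samples, key=lambda s: abs(s[0] - x))[1]

def train_meta(rows):
    """Level-1 learner: for each combination of level-0 outputs,
    remember the most common true class."""
    table = defaultdict(Counter)
    for preds, y in rows:
        table[preds][y] += 1
    overall = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda preds: (table[preds].most_common(1)[0][0]
                          if preds in table else overall)

def stack(level0_train, level1_train):
    base = [train_threshold(level0_train), train_nearest(level0_train)]
    # The meta learner sees only the level-0 predictions and the true class.
    rows = [(tuple(m(x) for m in base), y) for x, y in level1_train]
    meta = train_meta(rows)
    return lambda x: meta(tuple(m(x) for m in base))

# Hypothetical data: (feature value, class).
level0_train = [(1, "normal"), (2, "normal"), (9, "attack"), (10, "attack")]
level1_train = [(0, "normal"), (3, "normal"), (8, "attack"), (11, "attack")]
model = stack(level0_train, level1_train)
print(model(2.5), model(9.5))
```

Training the level-1 model on data the level-0 models have not seen keeps it from simply trusting whichever base classifier memorised the training set best.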

IV. RESULTS AND DISCUSSION

Classifier combinations are used to reduce the error due
to variance. Initially the C4.5 classifier is applied to the
NSL-KDD dataset, which contains 125,973 records in the
train set and 22,544 records in the test set. To improve the
performance of the C4.5 classifier on this dataset, the
classifier combination techniques bagging, boosting and
stacking are then applied.

For the normal class, as shown in Table V, bagging gives
the best result. Its recall is 0.719, compared with 0.708 for
C4.5, and both have the same precision (0.973). For the
anomaly class, as shown in Table VI, bagging has higher
recall and precision than C4.5.

TABLE V
PERFORMANCE METRICS FOR NORMAL CLASS

             Bagging    Boosting    Stacking    C4.5
TP           0.973      0.957       0.974       0.973
FP           0.288      0.346       0.326       0.304
Recall       0.719      0.677       0.693       0.708
Precision    0.973      0.957       0.974       0.973



TABLE VI
PERFORMANCE METRICS FOR ANOMALY CLASS

             Bagging    Boosting    Stacking    C4.5
TP           0.712      0.654       0.674       0.696
FP           0.027      0.043       0.026       0.027
Recall       0.972      0.953       0.971       0.971
Precision    0.712      0.654       0.674       0.696



V. CONCLUSIONS

Error due to variance has been reduced using classifier
combinations, thus improving classification performance
on the NSL-KDD dataset. Of the three combination
techniques, bagging provides the best results. The
NSL-KDD dataset can also be used for performance
evaluation with 5 classes (normal, dos, probe, u2r and r2l)
instead of 2 classes. Performance can be further improved
by reducing the features as given in [12].

Different sets of features can be used for different classes.
More classification algorithms and their combinations can
be applied to the NSL-KDD dataset.